Import the data
  NOC Year Decade     ID First.Name                   Name Last.Name Sex Age
1 AFG 1960  1960s  59346   Mohammad   Mohammad Asif Khokan    Khokan   M  24
2 AFG 1960  1960s  59043       Faiz Faiz Mohammad Khakshar  Khakshar   M  18
3 AFG 1960  1960s 109486      Abdul     Abdul Hadi Shekaib   Shekaib   M  20
  Height Weight      BMI BMI.Category        Team Population       GDP    GDPpC
1    171     78 26.67487            3 Afghanistan    8996973 537777800 59.77319
2    162     52 19.81405            0 Afghanistan    8996973 537777800 59.77319
3    178     68 21.46194            2 Afghanistan    8996973 537777800 59.77319
        Games Season City     Sport                                   Event
1 1960 Summer Summer Roma Wrestling Wrestling Men's Middleweight, Freestyle
2 1960 Summer Summer Roma Wrestling    Wrestling Men's Flyweight, Freestyle
3 1960 Summer Summer Roma Athletics              Athletics Men's 100 metres
     Medal Medal.No.Yes
1 No Medal            0
2 No Medal            0
3 No Medal            0
[ reached 'max' / getOption("max.print") -- omitted 3 rows ]
'data.frame': 151977 obs. of 24 variables:
$ NOC : Factor w/ 122 levels "AFG","ALB","AND",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Year : int 1960 1960 1960 1960 1960 1960 1960 1960 1960 1960 ...
$ Decade : Factor w/ 6 levels "1960s","1970s",..: 1 1 1 1 1 1 1 1 1 1 ...
$ ID : int 59346 59043 109486 59102 128736 29626 39922 106372 128736 58364 ...
$ First.Name : Factor w/ 14118 levels "","A","A.","Aadam",..: 8716 3731 64 599 64 11978 64 4634 64 8716 ...
$ Name : Factor w/ 74268 levels " Gabrielle Marie \"Gabby\" Adcock (White-)",..: 48941 19066 218 3341 220 64832 215 23793 220 48946 ...
$ Last.Name : Factor w/ 47370 levels "","-)","-Alard)",..: 23228 23112 38893 23137 44908 13260 16633 37860 44908 22890 ...
$ Sex : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
$ Age : int 24 18 20 35 20 28 22 23 20 20 ...
$ Height : int 171 162 178 166 179 168 172 170 179 166 ...
$ Weight : num 78 52 68 66 75 73 70 58 75 62 ...
$ BMI : num 26.7 19.8 21.5 24 23.4 ...
$ BMI.Category: Factor w/ 5 levels "0","1","2","3",..: 4 1 3 3 3 4 3 3 3 3 ...
$ Team : Factor w/ 332 levels "Acipactli","Afghanistan",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Population : int 8996973 8996973 8996973 8996973 8996973 8996973 8996973 8996973 8996973 8996973 ...
$ GDP : num 5.38e+08 5.38e+08 5.38e+08 5.38e+08 5.38e+08 ...
$ GDPpC : num 59.8 59.8 59.8 59.8 59.8 ...
$ Games : Factor w/ 30 levels "1960 Summer",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Season : Factor w/ 2 levels "Summer","Winter": 1 1 1 1 1 1 1 1 1 1 ...
$ City : Factor w/ 29 levels "Albertville",..: 19 19 19 19 19 19 19 19 19 19 ...
$ Sport : Factor w/ 51 levels "Alpine Skiing",..: 51 51 3 51 3 51 3 3 3 51 ...
$ Event : Factor w/ 489 levels "Alpine Skiing Men's Combined",..: 478 468 17 476 33 482 22 24 18 466 ...
$ Medal : Factor w/ 4 levels "Bronze","Gold",..: 3 3 3 3 3 3 3 3 3 3 ...
$ Medal.No.Yes: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
There is clearly an upward trend, but no seasonal pattern. The data is also a little choppy at the beginning, partly because the data points are not evenly spaced. Most Olympic Games are 4 years apart, but a few are just 2 years apart, and World War I and World War II left 8-year and 12-year gaps, respectively. Since time series data should be evenly spaced, we’ll only look at data from 1948 on, when the Olympics started being held every 4 years without interruption.
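The spacing argument can be checked directly. The following is a minimal sketch, assuming a hand-typed vector of Summer Games years (including the 1906 Intercalated Games) rather than the data frame used in this post; `games_years` and `evenly_spaced` are illustrative names.

```r
# Summer Games years, typed by hand for illustration (not read from the post's data).
# 1906 is the Intercalated Games; 1916, 1940 and 1944 were cancelled.
games_years <- c(1896, 1900, 1904, 1906, 1908, 1912, 1920, 1924, 1928, 1932,
                 1936, seq(1948, 2016, by = 4))

# Gap in years between consecutive Games
gaps <- diff(games_years)
print(table(gaps))

# Keep only the Games from 1948 on and confirm the spacing is constant
evenly_spaced <- games_years[games_years >= 1948]
print(unique(diff(evenly_spaced)))
```

The table of gaps shows the 2-, 8- and 12-year intervals, while the filtered series has a single gap length.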


Creating models
I’m going to try 4 different models.
\[
\begin{aligned}
y_{\text{linear}}(x) &= ax + b \\
y_{\text{quadratic}}(x) &= ax^2 + bx + c \\
y_{\text{exponential}}(x) &= a\exp(bx) + c \\
y_{\text{cubic}}(x) &= ax^3 + bx^2 + cx + d
\end{aligned}
\]
I’ll then use ANOVA to compare them in pairs: linear vs quadratic (nested models), and exponential growth vs the cubic, which can take an s-curve (sigmoid-like) shape.
Now I will fit the models to the number of events per Olympic Games.
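The fitting code is not shown here, so the following is only a sketch of how the nested linear-vs-quadratic comparison might be set up. It assumes synthetic data (`x`, `y`) and uses `lm()` with `anova()`; all variable names are illustrative.

```r
# Synthetic stand-in for "events per Games": 19 points with a roughly linear trend.
# These numbers are made up for illustration; they are not the post's data.
set.seed(42)
x <- 1:19
y <- 100 + 10 * x + rnorm(length(x), sd = 5)

# Nested polynomial models: each one adds a term to the previous model
fit_linear    <- lm(y ~ x)
fit_quadratic <- lm(y ~ x + I(x^2))
fit_cubic     <- lm(y ~ x + I(x^2) + I(x^3))

# F-test: does the extra quadratic term reduce the residual sum of squares
# by more than chance would explain?
comparison <- anova(fit_linear, fit_quadratic)
print(comparison)
```

The exponential model is not linear in its parameters, so it would need `nls()` with starting values rather than `lm()`.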
| Res.Df | Res.Sum Sq | Df | Sum Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| 17 | 33.33158 | NA | NA | NA | NA |
| 16 | 30.41244 | 1 | 2.919136 | 1.535759 | 0.2331213 |
| 16 | 32.28073 | 0 | 0.000000 | NA | NA |
| 15 | 29.04971 | 1 | 3.231026 | 1.668361 | 0.2160269 |
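The F values in the table can be reproduced from its other columns: the drop in residual sum of squares per added parameter is divided by the residual mean square of the larger model. A short check for the linear-vs-quadratic comparison, with the numbers copied from the first two rows of the table (variable names are illustrative):

```r
# Values copied from the ANOVA table (rows 1 and 2: linear vs quadratic)
rss_small <- 33.33158   # residual sum of squares, smaller model
rss_large <- 30.41244   # residual sum of squares, larger model
df_large  <- 16         # residual degrees of freedom, larger model
df_diff   <- 1          # number of extra parameters in the larger model

# F = (reduction in RSS per extra parameter) / (residual mean square of larger model)
f_value <- ((rss_small - rss_large) / df_diff) / (rss_large / df_large)

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_diff, df_large, lower.tail = FALSE)

print(c(F = f_value, p = p_value))
```

With a p-value of about 0.23, the quadratic term does not improve significantly on the linear fit at the usual 0.05 level.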
Sports
Year Mean_Weight StdDev_Weight Mean_Height StdDev_Height Sport Sex
1 1924 64.00000 0.000000 167.0000 0.000000 Swimming F
2 1956 61.00000 4.780914 169.7333 3.634491 Swimming F
3 1960 62.73469 5.619073 169.3469 6.839076 Swimming F
4 1964 63.06000 6.466270 171.3600 4.378799 Swimming F
5 1968 62.45455 5.361348 170.3636 4.583033 Swimming F
6 1972 60.23611 5.491333 170.3889 4.949194 Swimming F
'data.frame': 339 obs. of 7 variables:
$ Year : int 1924 1956 1960 1964 1968 1972 1976 1980 1984 1988 ...
$ Mean_Weight : num 64 61 62.7 63.1 62.5 ...
$ StdDev_Weight: num 0 4.78 5.62 6.47 5.36 ...
$ Mean_Height : num 167 170 169 171 170 ...
$ StdDev_Height: num 0 3.63 6.84 4.38 4.58 ...
$ Sport : Factor w/ 10 levels "Basketball","Canoeing",..: 9 9 9 9 9 9 9 9 9 9 ...
$ Sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...


| Medal | mean |
|---|---|
| Bronze | 25.55859 |
| Gold | 25.28269 |
| No Medal | 24.93049 |
| Silver | 25.48383 |
# A tibble: 6 x 3
# Groups: Year [3]
Year Sex mean.Age
<int> <fct> <dbl>
1 1960 F 21.6
2 1960 M 26.0
3 1964 F 21.5
4 1964 M 25.7
5 1968 F 20.5
6 1968 M 25.1
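A summary of this shape (mean age per Year and Sex) could be computed along the following lines. This is a sketch on a small made-up data frame using base R's `aggregate()`; the tibble output above suggests dplyr was used originally, so treat this as an equivalent rather than the original code.

```r
# Tiny made-up sample with the same column names as the athlete data
athletes <- data.frame(
  Year = c(1960, 1960, 1960, 1964, 1964, 1964),
  Sex  = c("F", "M", "M", "F", "F", "M"),
  Age  = c(20, 24, 28, 21, 23, 25)
)

# Mean age for every Year/Sex combination
mean_age <- aggregate(Age ~ Year + Sex, data = athletes, FUN = mean)
names(mean_age)[names(mean_age) == "Age"] <- "mean.Age"

# Sort by Year, then Sex, to match the layout of the summary above
mean_age <- mean_age[order(mean_age$Year, mean_age$Sex), ]
print(mean_age)
```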
